Collection of Internet

Internet Message Format | 1994-01-24 | 7KB

Return-Path: <dsr@hplb.hpl.hp.com>
Received: from dxmint.cern.ch by nxoc01.cern.ch (NeXT-1.0 (From Sendmail 5.52)/NeXT-2.0) id AA13758; Mon, 11 Jan 93 11:49:46 MET
Received: by dxmint.cern.ch (5.65/DEC-Ultrix/4.3) id AA21299; Mon, 11 Jan 1993 12:04:47 +0100
Received: from dragget.hpl.hp.com by hplb.hpl.hp.com; Mon, 11 Jan 93 11:01:29 GMT
Received: by manuel.hpl.hp.com (16.6/15.6+ISC) id AA23480; Mon, 11 Jan 93 11:05:51 GMT
From: Dave_Raggett <dsr@hplb.hpl.hp.com>
Message-Id: <9301111105.AA23480@manuel.hpl.hp.com>
Subject: Re Re Customer pull on HTTP2
To: www-talk@nxoc01.cern.ch
Date: Mon, 11 Jan 93 11:05:48 GMT
Cc: dsr@hplb.hpl.hp.com
Mailer: Elm [revision: 66.25]

Kevin Hoadley says in:

>> Caching
>> -------
>>
>> It will be desirable to avoid overloading servers with popular documents by
>> supporting a caching scheme at local servers (or even at browsers?).

> As well as caching, replication would be nice. But this is only
> practical if resource identifiers do not contain location information
> (otherwise replication is only possible by making all the peer servers
> appear to be one machine, as in the DNS CNAME suggestion I made some time
> ago). But if resource identifiers do not contain host information then you
> need an external means of determining how to reach the resource.
> This is analogous to routing protocols (an address is not a route ...).
> Such a system is probably over-ambitious for now.

I agree, but it is important to keep an eye on where things are going. The ability to replicate documents in this way will depend on name servers, e.g. X.500. In the meantime, is this necessary? To begin with, a simple scheme is to send all remote requests via a fast local server. This server checks whether the Udi is in its cache, and if not, forwards the request to the machine named in the Udi itself. You can extend this to take advantage of several caches, and the work done by ANSA (Advanced Networked Systems Architecture) on trading may be appropriate.

...
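The "fast local server" scheme in the previous paragraph is a cache-or-forward loop. What follows is a minimal sketch of it in Python, under stated assumptions. It assumes Udis are written as URL-style strings whose authority part (host[:port]) names the owning machine. The actual network fetch is passed in as a hypothetical `fetch_remote` function rather than tied to any wire protocol. The names `LocalCacheServer` and `fetch_remote` are illustrative only. There is no expiry or purge policy here; that question comes up later in the message.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# Hypothetical transport: given the owning host[:port] and the Udi,
# return the document bytes. A real server would open a connection here.
FetchRemote = Callable[[str, str], bytes]


class LocalCacheServer:
    """The 'fast local server' that all remote requests are routed through."""

    def __init__(self, fetch_remote: FetchRemote) -> None:
        self._cache: Dict[str, bytes] = {}
        self._fetch_remote = fetch_remote

    def get(self, udi: str) -> bytes:
        # Cache hit: answer locally without touching the remote server.
        cached = self._cache.get(udi)
        if cached is not None:
            return cached
        # Cache miss: forward to the machine named in the Udi itself
        # (assumes a URL-style Udi such as http://host:port/path).
        authority = urlsplit(udi).netloc
        if not authority:
            raise ValueError(f"Udi does not name a host: {udi!r}")
        body = self._fetch_remote(authority, udi)
        self._cache[udi] = body  # kept indefinitely: no expiry in this sketch
        return body
```

Several caches can be chained by passing `lambda authority, udi: upstream.get(udi)` as `fetch_remote`. A miss in the front cache then falls through to a second `LocalCacheServer` before anything reaches the owning machine.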
talking about when to purge the cache

> I think this is silly. I haven't changed a document for six months,
> therefore it is safe to say that it won't be changed for the next six
> months ...

Yes, perhaps not one of my best ideas! I think we need some position in between caching docs for only a few minutes and the full replication mechanism used with network news and NNTP.

One approach is for the server to periodically refresh cache contents (this is what Lotus Notes does). You set it up to refresh docs at night, or perhaps on a trickle basis in the background. The problem is knowing the appropriate interval for each document. The "Expiry:" field, or an equivalent "KeepFor:" (time to live) field, gives an explicit suggestion when present. My "silly" suggestion was a rule of thumb aimed at allowing the server to "learn" that some docs don't change much, and so can be refreshed at longer intervals.

Another complementary approach is to provide a mechanism whereby a server owning a document informs a list of client servers when it determines that a given document has changed. This is critical for a successful room booking application. For this to work, the protocol needs to include the following requests:

    NOTIFY hplose.hpl.hp.com:8001 ADD Udi RETRY 1000 10

This is used to inform a server (named in the Udi) that another server (host hplose.hpl.hp.com, port 8001) wishes to be informed when this document is changed or deleted. The RETRY parameter is optional. It gives the notification retry interval in seconds, followed by how many times to try.

    NOTIFY hplose.hpl.hp.com:8001 REMOVE Udi

The reverse operation, removing a server from a notification list.

    CHANGED Udi
    <Doc header>
    <Doc body>

The message sent to servers on the notification list when the specified doc has changed. If the doc has been deleted, the body should be empty.
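The NOTIFY/CHANGED requests above can be sketched in Python as the kind of separate notifier program mentioned in the next paragraph. This is a sketch under several assumptions, not a specification:

* A Udi is a single whitespace-free token.
* If RETRY is omitted, the notifier makes one attempt only. The message leaves that default open.
* CHANGED uses CRLF line endings with a blank line between the doc header and body. This is also an assumption.
* `send` is a hypothetical transport that raises `OSError` when no connection can be made.
* `parse_notify`, `NotificationList`, `changed_message` and `deliver` are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: deliver a message to host:port, raising OSError
# if no connection can be established.
Send = Callable[[str, int, str], None]


@dataclass
class Subscriber:
    host: str
    port: int
    # Assumed default when RETRY is omitted: a single attempt, no waiting.
    retry_interval: int = 0  # seconds between attempts
    retry_count: int = 1     # total number of attempts


def parse_notify(line: str) -> Tuple[str, str, Subscriber]:
    """Parse 'NOTIFY host:port ADD|REMOVE Udi [RETRY seconds count]'."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "NOTIFY" or parts[2] not in ("ADD", "REMOVE"):
        raise ValueError(f"malformed NOTIFY request: {line!r}")
    host, _, port = parts[1].rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {parts[1]!r}")
    sub = Subscriber(host, int(port))
    extra = parts[4:]
    if extra:
        # RETRY only makes sense when adding a subscriber.
        if parts[2] != "ADD" or len(extra) != 3 or extra[0] != "RETRY":
            raise ValueError(f"malformed RETRY clause: {line!r}")
        sub.retry_interval, sub.retry_count = int(extra[1]), int(extra[2])
    return parts[2], parts[3], sub


class NotificationList:
    """Per-Udi list of servers that asked to hear about changes."""

    def __init__(self) -> None:
        self._subs: Dict[str, Dict[Tuple[str, int], Subscriber]] = {}

    def handle(self, line: str) -> None:
        op, udi, sub = parse_notify(line)
        key = (sub.host, sub.port)
        if op == "ADD":
            self._subs.setdefault(udi, {})[key] = sub
        else:
            self._subs.get(udi, {}).pop(key, None)

    def subscribers(self, udi: str) -> List[Subscriber]:
        return list(self._subs.get(udi, {}).values())


def changed_message(udi: str, header: str, body: str = "") -> str:
    # Assumed framing: request line, doc header, blank line, doc body.
    # An empty body signals that the document was deleted.
    return f"CHANGED {udi}\r\n{header}\r\n\r\n{body}"


def deliver(sub: Subscriber, message: str, send: Send,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Make up to retry_count attempts, waiting retry_interval between them."""
    for attempt in range(sub.retry_count):
        try:
            send(sub.host, sub.port, message)
            return True
        except OSError:
            if attempt + 1 < sub.retry_count:
                sleep(sub.retry_interval)
    return False
```

Subscribers are keyed by (host, port), so a repeated ADD just updates the retry settings instead of creating a duplicate entry. The `sleep` parameter exists only so the retry loop can be exercised without actually waiting.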
In the case where it is currently impossible to establish a connection with a server on the notification list, the notification should be periodically retried until a suitable timeout period has expired (see the RETRY parameter above). You don't need to complicate the http server loop to implement this mechanism, as notifications can be handled by a separate program.

...
talking about problems with comparing date/time info

> This also depends on hosts agreeing on the date. To quote
> RFC1128, talking about a 1988 survey of the time/date on
> Internet hosts, "... a few had errors as much as two years"

Wow! I had no idea that this was the case. I had hoped that all machines would support date/time conversion for all known time zones, so that by including the time zone as part of the format, there would be no problem.

>> I think that we need to provide an operation in which the server returns a
>> document only if it is later than a date/time supplied with the request.

> This would be useful as part of a replication system, as long as both ends
> exchanged timestamps initially so that the dates can be synchronised.

In this case we need to define how servers should process date/time info, particularly when a mismatch is detected.

...
talking about copyright protection

> It may be stating the obvious, but once you allow a user to access your
> data such that they can save it, there is no technical way you can prevent
> them from publicly redistributing your data. This is a social/legal
> problem, not a technical one. Accepting that nothing can be done to stop
> deliberate abuse of licensed information, there is a need to prevent
> accidental abuse.

There is no *technical way* to stop me driving my car at a passerby and killing him! The answer is that it is illegal to breach copyright law. In HP we have notices next to each photocopier, reminding us of what the law allows us to do. The same will apply to networked access.
Publishers are concerned that they receive fair payment for their information, and the critical issue is to ensure that all processes can pass an audit to show their compliance. My idea is that it's OK to cache copyrighted docs so long as you put in place an effective mechanism for logging and handling payments. This mechanism must be able to pass a suitable audit procedure. I believe that the scheme I described would do this.

> Probably the simplest way to do this is to mark the
> document as one which should NOT be cached.

You need to separate the issue of copyright protection from ensuring secure access to restricted information. I proposed the "Distribution:" header for this purpose.

Many thanks for your comments,

Best wishes,

Dave Raggett, dsr@hplb.hpl.hp.com